Abstract: Privacy of outsourced data is a major challenge. The
insecurity of the network environment and the untrustworthiness of
service providers are obstacles to offering the database as a
service. The collection and storage of personally identifiable
information is a major privacy concern. Online public databases and
resources pose a significant risk to user privacy, since a malicious
database owner may monitor user queries and infer useful information
about the customer. The challenge in data privacy is to share data
with a third party while securing the valuable information from
unauthorized access and use by that third party. A Private
Information Retrieval (PIR) scheme allows a user to query a database
while hiding the identity of the data retrieved. The naive solution
for confidentiality is to encrypt the data before outsourcing; query
execution, key management and statistical inference are the major
challenges in that case. The proposed system suggests a mechanism
for secure storage and retrieval of private data using secret
sharing. The idea is to store private information with a highly
available storage provider so that it can be accessed from anywhere
using queries, while hiding the actual data values from the storage
provider. The private information retrieval system is implemented
using a Secure Multi-party Computation (SMC) technique based on
secret sharing. Multi-party computation enables parties to compute a
joint function over their private inputs. The query results are
obtained by performing a secure computation on the shares held by
the different servers.

Keywords: Database, Data Storage, Private Information Retrieval,
Query Processing, Shamir’s Secret Sharing, Secure Multi-party
Computation

I. Introduction 

Secure storage of confidential data and its private retrieval are
major research challenges when the data is outsourced to an
untrusted third-party service provider. Private Information
Retrieval (PIR) allows clients to retrieve data from a database
server in a privacy-preserving manner. PIR schemes make use of
cryptographic protocols to safeguard the privacy of database users.
This allows clients to retrieve records from public databases while
the identity of the retrieved records is completely hidden from the
database owners. The major goal is that the database server should
be able to respond to client queries without learning any
information about the records retrieved.

A trivial solution is to encrypt the database using cryptographic
techniques. For query processing, however, the entire database must
be downloaded and the queries issued locally. Query execution over
encrypted data is a major research challenge, and most of the
solutions are inefficient due to the large query processing time
and the complexities involved in key management. Downloading the
entire encrypted database is clearly information-theoretically
private with respect to the query, since the server cannot learn
which record the client seeks. However, the key management, the
time-consuming encryption and decryption, the overhead of
downloading a large encrypted database and the difficulties
involved in query processing make the scheme impractical.

Fragmentation is another solution for providing confidentiality of
the outsourced data. The data owner partitions the tables
horizontally or vertically and distributes them to different
servers. Encryption is unavoidable in this case as well, because
fragmentation cannot preserve the confidentiality of a single
attribute. Collusion between servers is also a security issue.
Agarwal et al. [1] use a secret sharing technique to provide
confidentiality. Their solution supports different types of queries
and runs them efficiently. However, untrusted servers may have
prior knowledge about the data distribution or frequencies.

The proposed system suggests a secret sharing method for
confidentiality of the outsourced data. A relation is split into
random shares and these shares are sent to different servers. This
provides both reliability and security. The threshold secret
sharing scheme allows the data to be retrieved from any k of the n
servers where the shares are stored. The random shares also provide
information-theoretic security at the cost of additional storage
space. Query processing and searching are an issue here, and an
efficient mechanism for both is also suggested in this paper. It
needs interaction between the client and the different servers. The
servers send the shares that form the result of the query, and the
shares are then combined to form the original data. Since the
computations are performed on shares, this provides a Secure
Multi-party Computation (SMC) environment.

There are several situations in which mutually distrustful parties
need to perform a joint computation without revealing their inputs
to each other. This happens, for example, during auctions, voting,
negotiations and business analytics. The problem is how to perform
such a computation without revealing the inputs. SMC [10] is the
solution to such problems. It permits a group of parties to jointly
compute a function of their private inputs while preserving the
privacy of the inputs and the correctness of the result. Every
participant gets the result of the computation without exposing its
input. The SMC protocol was first introduced by Yao in 1982 through
the famous Millionaire’s problem [11]. The protocol is secure if no
participant can learn more than what follows from the description
of the public function, its own input and the result of the
computation.
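
As a small illustration of a joint computation that keeps the inputs hidden, the following sketch computes a sum using additive secret sharing. It is not the protocol of this paper or Yao's protocol: the modulus, the function names and the single-process simulation of the parties are assumptions made for illustration, and the parties are assumed to follow the protocol honestly.

```python
import secrets

# Assumed modulus; it must exceed the largest possible sum.
MODULUS = 2**61 - 1


def additive_shares(value, n):
    """Split `value` into n random-looking parts that add up to it mod MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their inputs."""
    n = len(private_inputs)
    # Party i splits its input and hands share j to party j.
    dealt = [additive_shares(v, n) for v in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published partial sums add up to the sum of all inputs.
    return sum(partial) % MODULUS


print(secure_sum([3, 5, 9]))
```

Each party sees only uniformly random shares of the other inputs and the published partial sums, from which the total follows but not the individual inputs.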

SMC is accomplished here by using Shamir’s secret sharing scheme.
In secret sharing, the secret is not held by a single party but
distributed among several, so that the secret can be reconstructed
even if some of the parties involved in the computation are
malicious. A verifiable secret sharing scheme is one in which
parties can verify the validity of the shares for consistency. To
handle malicious parties involved in a computation, the secret
sharing scheme needs to be verifiable.

The development of secret sharing schemes started as a solution to
the problem of safeguarding cryptographic keys: the key is
distributed among n participants, and t or more of the participants
can recover it by pooling their shares. Thus an authorized set is
any subset of participants containing at least t members. This
scheme is denoted a (t,n) threshold scheme [8]. The notion of a
threshold secret sharing scheme was independently proposed by
Shamir [13] and Blakley in 1979. Since then much work has been put
into the investigation of such schemes. Linear constructions are
the most efficient and widely used. A threshold secret sharing
scheme is called ideal if the share size is the same as the secret
size, and perfect if fewer than t shares give no information about
the secret. Blakley’s scheme is not perfect, while Shamir’s scheme
is perfect. Both Blakley’s and Shamir’s constructions realize
t-out-of-n threshold secret sharing; however, their constructions
are fundamentally different.

Shamir’s scheme is based on polynomial interpolation over a finite
field. It uses the fact that a polynomial of degree $t - 1$ is
uniquely determined by $t$ data points. A polynomial
$f(x) = \sum_{i=0}^{t-1} a_i x^i$, with $a_0$ set to the secret
value and the coefficients $a_1$ to $a_{t-1}$ assigned random
values in the field, is used for secret sharing. The polynomial
$f(x)$ is evaluated at $n$ different points and each value is given
as a share to a participant; that is, the value $f(i)$ is given to
user $i$ as its secret share. Here $t$ is the threshold. When any
$t$ out of $n$ users join together, they can reconstruct the
polynomial using Lagrange interpolation with $t$ points and hence
obtain the secret $a_0$. Any set of $t - 1$ users gains no
information about the secret, so the scheme is perfect. The scheme
is easy to compute when the necessary data is available, and it
avoids a single point of failure. It also increases reliability,
security, safety and convenience.
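
The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the prime modulus, the evaluation points 1..n and the function names are assumptions, since the paper does not fix them.

```python
import secrets

# Assumed field: integers modulo a Mersenne prime (the paper does not fix one).
PRIME = 2**61 - 1


def split_secret(secret, t, n, prime=PRIME):
    """Return n shares (i, f(i)); any t of them reconstruct `secret`."""
    if not (1 <= t <= n < prime) or not (0 <= secret < prime):
        raise ValueError("need 1 <= t <= n < prime and 0 <= secret < prime")
    # f(x) = a_0 + a_1 x + ... + a_{t-1} x^{t-1} with a_0 = secret
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(t - 1)]
    shares = []
    for i in range(1, n + 1):
        y = 0
        for a in reversed(coeffs):  # Horner's rule
            y = (y * i + a) % prime
        shares.append((i, y))
    return shares


def reconstruct_secret(shares, prime=PRIME):
    """Lagrange interpolation evaluated at x = 0, which yields a_0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % prime
                den = (den * (xj - xm)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+)
        secret = (secret + yj * num * pow(den, -1, prime)) % prime
    return secret


shares = split_secret(1234, t=3, n=5)
print(reconstruct_secret(shares[:3]))
```

Any three of the five shares give back the secret, while two shares are consistent with every possible secret in the field.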

The rest of the paper is organized as follows. Related work is
given in Section II. The proposed system and architecture are
described in Section III. Section IV contains an example.
Conclusions are drawn in Section V.

II. Related work 

Private Information Retrieval (PIR) was introduced by Benny Chor
and has already received a lot of attention. The study of PIR is
motivated by growing concern about the user’s privacy when querying
a large commercial database. Protocols for PIR and Symmetric
Private Information Retrieval (SPIR) provide a limited type of
privacy-preserving search. In PIR a server and a client are
involved: the server holds a database of n items and the client
wants to obtain the item at position i without the server learning
the value of i. In the case of SPIR, it is additionally required
that the user does not learn any information about items other than
the one that was requested. These protocols improve on general
multiparty computation and have sub-linear communication and
polynomial computational complexity. However, they remain
inefficient for many practical uses and support only simple
selection rather than general query capability.

In database outsourcing, one party possesses a large amount of data
but does not have enough storage at hand to store it reliably.
Many papers address the issues related to database outsourcing.
The major issue is that the data must be kept confidential from the
untrusted server and must be retrieved without revealing any
information. Some of these approaches use encryption systems.
Searching over encrypted data is time consuming, since the search
word must be encrypted before searching. The running time of the
search in these approaches is linear in the number of searchable
tokens, so searching becomes inefficient even though it provides
better security. This highlights the trade-off between efficiency
and strong privacy guarantees. Curtmola et al. use inverted indices
to gain efficiency: the querier preprocesses the data and computes
inverted indices on the search words. In this case, an untrusted
server can learn the search pattern over multiple queries.

SMC based on homomorphic public-key encryption has also been
proposed. Here each party distributes encryptions of its private
inputs to the other parties, and the computations are performed on
this encrypted data. The homomorphic property of the encryption can
be used to achieve a specific functionality. An authorized set of
users can perform threshold decryption to obtain the final result.

III. Proposed System 

The proposed system suggests a method of storing and retrieving
private data in a secure and effective manner. The private data may
include personal information, sensitive information, unique
identifiers and so on. The data store may be a private information
store built on a cloud database.

A. Secure Data storage

The system does not use any encryption technique. Shares
corresponding to each relation are generated using Shamir’s secret
sharing scheme. These shares are then stored on different servers.
The architecture for data storage is shown in Figure 1.



Fig. 1. Architecture for Data storage 


The architecture has four main modules:

1) Database Owner

2) Share Generator

3) Database Hub

4) Database Servers

The database owner gives the table schema to the table schema
handler, which copies it to all the database servers. When data is
inserted, the data insertion module passes the data to the share
generator module. The share generator converts the data into a
bytecode form using the bytecode generator and then divides it into
shares using the Shamir secret share generator. The shares are then
distributed to and stored in the different database servers.

B. Algorithm for data storage 

Step 1: The database schema is copied to the n database servers.

Step 2: When an insert operation is performed, each attribute value
is divided into shares and stored in the database servers:
$DS_1 = R(A_{11}, A_{21}, A_{31}, \ldots, A_{m1})$

$DS_2 = R(A_{12}, A_{22}, A_{32}, \ldots, A_{m2})$

$DS_n = R(A_{1n}, A_{2n}, A_{3n}, \ldots, A_{mn})$

where $DS_n$ is the $n$th database server,
$R(A_{1n}, A_{2n}, A_{3n}, \ldots, A_{mn})$ is the record containing
the attribute shares of the $m$ attributes, and $A_{mn}$ is the
$n$th share of the $m$th attribute.

Step 3: Along with the attribute values, a primary key column
containing index values starting from 1 is created automatically.
Its purpose is to make the retrieval process easy.
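
The three storage steps can be sketched as follows. This is an illustrative model only: the in-memory `ShareStore` class, the `encode` function (a stand-in for the bytecode generator), the prime and the evaluation points 1..n are all assumptions and are not taken from the paper.

```python
import secrets

# Assumed field size; it limits an encoded attribute value to 15 bytes.
PRIME = 2**127 - 1


def encode(value):
    """Map an attribute value to an integer (stand-in for the bytecode form)."""
    data = str(value).encode("utf-8")
    number = int.from_bytes(b"\x01" + data, "big")  # 0x01 keeps leading zero bytes
    if number >= PRIME:
        raise ValueError("value too long for the assumed field")
    return number


def make_shares(secret, t, n):
    """Shamir shares f(1), ..., f(n) of `secret` with threshold t."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return [sum(a * pow(x, k, PRIME) for k, a in enumerate(coeffs)) % PRIME
            for x in range(1, n + 1)]


class ShareStore:
    """n in-memory tables standing in for the database servers DS_1..DS_n."""

    def __init__(self, schema, t, n):
        self.schema, self.t, self.n = list(schema), t, n
        self.servers = [[] for _ in range(n)]  # Step 1: same schema on every server

    def insert(self, record):
        index = len(self.servers[0]) + 1  # Step 3: index values start from 1
        rows = [{"Index": index} for _ in range(self.n)]
        for attr in self.schema:  # Step 2: share every attribute value
            shares = make_shares(encode(record[attr]), self.t, self.n)
            for row, share in zip(rows, shares):
                row[attr] = share
        for server, row in zip(self.servers, rows):
            server.append(row)
        return index


store = ShareStore(["Patientid", "Patientname"], t=2, n=3)
store.insert({"Patientid": 101, "Patientname": "Ann"})
print(store.servers[0][0])
```

Each server ends up with a row of the same shape, holding one share per attribute plus the common index.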

C. Secure Data Retrieval 

The architecture for secure data retrieval is shown in Figure 2.


The main modules are:

1) Client 

2) Computation Server 

3) Database Hub 

4) Database Servers 



Fig. 2. Architecture for Secure Data Retrieval

Step 1: The client issues a query.

Step 2: The Query Handler parses the query, extracts the
where-condition attribute and passes it to the Computation Agent
(CA).

Step 3: The CA requests the shares of the condition attribute from
the DB Server Manager, which forwards the request to all
share-holding databases.

Step 4: After getting the attribute shares, the CA reconstructs the
attribute values, checks the condition and finds the index values
of the records that satisfy it.

Step 5: The Query Handler gives the select attribute name and
requests the CA to get the shares corresponding to the index values
obtained in Step 4.

Step 6: The CA forwards a packet containing the following fields to
the DB Server Manager and from there to the database servers.

TABLE I
Packet Format

| Index value | Select attribute name | Client IP address |

Step 7: The DB servers send the shares of the requested attribute
column at the specified index values to the provided IP address of
the client.

Step 8: The result constructor in the client combines the shares to
retrieve the actual query result.
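
The retrieval steps can be traced with a small single-process sketch. It is an illustration under assumptions, not the paper's implementation: the function names, the text encoding, the prime, the evaluation points 1..n and the in-memory server layout are all hypothetical, and the network hops between Query Handler, CA, DB Server Manager and servers are collapsed into function calls.

```python
import secrets

PRIME = 2**127 - 1  # assumed field; values are limited to 15 bytes


def encode(text):
    return int.from_bytes(text.encode("utf-8"), "big")


def decode(number):
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("utf-8")


def share(secret, t, n):
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return [(x, sum(a * pow(x, k, PRIME) for k, a in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]


def reconstruct(points):
    """Lagrange interpolation at x = 0 from t points (x, y)."""
    total = 0
    for xj, yj in points:
        num = den = 1
        for xm, _ in points:
            if xm != xj:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total


def build_servers(rows, t, n):
    """Storage side, included only to make the sketch runnable."""
    servers = [dict() for _ in range(n)]  # server: {index: {attribute: (x, y)}}
    for index, row in enumerate(rows, start=1):
        for attr, value in row.items():
            for server, point in zip(servers, share(encode(value), t, n)):
                server.setdefault(index, {})[attr] = point
    return servers


def run_query(servers, t, select_attr, where_attr, where_value):
    chosen = servers[:t]  # any t servers suffice
    # Steps 3-4: the CA reconstructs the condition column and finds the indexes
    matches = [i for i in sorted(chosen[0])
               if decode(reconstruct([s[i][where_attr] for s in chosen])) == where_value]
    # Steps 5-7: one packet per index; the client IP address is omitted here
    packets = [(i, select_attr) for i in matches]
    # Step 8: the result constructor rebuilds the selected values
    return [decode(reconstruct([s[i][attr] for s in chosen])) for i, attr in packets]


rows = [{"Patientname": "Ann", "Diagonosis": "Aids"},
        {"Patientname": "Bony", "Diagonosis": "Cancer"}]
print(run_query(build_servers(rows, t=2, n=3), 2, "Patientname", "Diagonosis", "Aids"))
```

As in Step 4, the condition column is reconstructed in the clear at the CA, so in this sketch the CA must be trusted while the individual database servers only ever see shares.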

IV. Example Scenario 

Consider a hospital database system that contains patients’ disease
records. Since it is a large database, it is outsourced to cloud
storage. The database contains sensitive information, so its
content should not be revealed to a third party. Suppose also that
the hospital authority wants to know how many AIDS patients there
are while preserving the anonymity of the patients. In this case,
the hospital data is stored in 3 database servers in the form of
shares and a (2,3) scheme is used. Each database item is split into
shares using Shamir’s secret sharing scheme and stored in different
servers having the same database schema. The database owner stores
the data in the form of shares, and the Database Server Manager has
the details of the locations of the database servers where these
shares are stored.

Consider the patient details table shown in Table II. Each
attribute value is divided into 3 shares using Shamir’s secret
sharing scheme and stored in three database servers as in Table
III, Table IV and Table V respectively.

TABLE II
PATIENT_DETAILS TABLE

| Patientid | Patientname | Doctorid | Diagonosis |
| 101 | Ann | 51 | Aids |
| 102 | Bony | 21 | Cancer |
| 103 | Cara | 51 | Fever |
| 104 | Dona | 26 | Aids |

TABLE III
Database Server I

| Index | Patientid | Patientname | Doctorid | Diagonosis |
| 1 | 189 | 1115 | 321 | [illegible] |
| 2 | 168 | 479 | 743 | 931 |
| 3 | 236 | 2314 | 209 | 832 |
| 4 | 247 | 641 | 659 | 120 |

TABLE IV
Database Server II

| Index | Patientid | Patientname | Doctorid | Diagonosis |
| 1 | 325 | 789 | 510 | 210 |
| 2 | 320 | 197 | 408 | 1245 |
| 3 | 543 | 980 | 201 | 911 |
| 4 | 468 | 319 | 877 | 319 |

TABLE V
Database Server III

| Index | Patientid | Patientname | Doctorid | Diagonosis |
| 1 | 509 | 865 | 712 | 2354 |
| 2 | 558 | 107 | 501 | 123 |
| 3 | 1024 | 954 | 646 | 986 |
| 4 | 479 | 310 | 981 | 912 |

1) The client generates the query: Select Patientname from
patient_details where Diagonosis=’Aids’.

2) The query is passed to the Query Handler (QH) module. The QH
extracts the where portion, takes the attribute Diagonosis and
sends it to the Computation Agent (CA).

3) The CA forwards the attribute name to the Database Manager.

4) The Database Manager requests the ’Diagonosis’ column values
from any 2 database servers, as per the threshold.

5) On getting the request, each database server replies by sending
the shares of the requested attribute column (Diagonosis) to the
Database Manager.

6) The Database Manager forwards the shares to the CA, so that the
CA gets the Diagonosis column shares from any two of the database
servers, as shown in Table VI.

TABLE VI
Share from servers

| Diagonosis | Diagonosis |
| [illegible] | 210 |
| 931 | 1245 |
| 832 | 911 |
| 120 | 319 |

7) The CA applies Lagrange interpolation to reconstruct the
original values of the Diagonosis field, as shown in Table VII.

TABLE VII
After Reconstruction

| Diagonosis |
| Aids |
| Cancer |
| Fever |
| Aids |

8) The CA checks the where condition of the query, that is
Diagonosis=’Aids’, and computes the index values 1 and 4, which
satisfy the condition.

9) The Query Handler passes the select attribute name
(Patientname) to the CA, and the CA forwards the packet containing
the index values, the select attribute name and the client IP
address to the Database Manager.

10) The Database Manager requests the share-holding DB servers to
send the shares of the attribute Patientname at the specified
indexes to the client IP address.

11) The DB servers pass the shares to the specified IP address of
the client.

12) The result constructor in the client receives all the three
shares of Patientname as in Table VIII.

TABLE VIII
Shares of patientname

| Patientname | Patientname |
| 1115 | 789 |
| 641 | 319 |

13) After reconstruction, the client gets the result as in Table IX.

TABLE IX
Result of Query

| Patientname |
| Ann |
| Dona |

V. Conclusion 

PIR and the secure storage and retrieval of data on untrusted
servers pose a major security challenge. We presented a secure
database storage and retrieval system based on secret sharing.
Since the data is stored as shares in the databases, knowledge of
fewer shares than the threshold reveals no information about the
original data. The query analysis and result reconstruction are
performed in the client-side computation agent, which ensures
privacy-preserving query processing and computation. The system is
intended to be efficient, secure and reliable. The work can be
extended to unstructured data. Collusion among the service
providers to retrieve the original data is a major security
concern. A secret vector containing the values at which each
user’s secret polynomial is evaluated can be used; it is known only
to the clients and hence provides added security against untrusted
service providers. A simple and efficient XOR-based secret sharing
scheme can be used if the number of servers and the threshold are
small.
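
One simple XOR-based variant is sketched below for illustration. It is the basic n-of-n construction, in which all shares are required for reconstruction, and is an assumption about what such a scheme could look like rather than the scheme the authors have in mind; the function names are hypothetical.

```python
import secrets
from functools import reduce


def _xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_split(secret, n):
    """n-of-n sharing: n - 1 random pads plus the secret XORed with all of them."""
    if n < 2:
        raise ValueError("need at least two shares")
    pads = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(_xor, pads, secret)
    return pads + [last]


def xor_combine(shares):
    """XOR all shares together to recover the secret."""
    return reduce(_xor, shares)


shares = xor_split(b"Aids", 3)
print(xor_combine(shares))
```

Any n - 1 of these shares are uniformly random and reveal nothing, but losing a single server makes the data unrecoverable, so this variant gives up the availability that a threshold below n provides.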